11 research outputs found

    Recent advances in inferring viral diversity from high-throughput sequencing data

    Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. In order to account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue being complemented by technological developments. ISSN: 0168-170

    V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput sequencing data

    High-throughput sequencing technologies are used increasingly, not only in viral genomics research but also in clinical surveillance and diagnostics. These technologies facilitate the assessment of the genetic diversity in intra-host virus populations, which affects transmission, virulence, and pathogenesis of viral infections. However, there are two major challenges in analysing viral diversity. First, amplification and sequencing errors confound the identification of true biological variants, and second, the large data volumes represent computational limitations. To support viral high-throughput sequencing studies, we developed V-pipe, a bioinformatics pipeline combining various state-of-the-art statistical models and computational tools for automated end-to-end analyses of raw sequencing reads. V-pipe supports quality control, read mapping and alignment, low-frequency mutation calling, and inference of viral haplotypes. For generating high-quality read alignments, we developed a novel method, called ngshmmalign, based on profile hidden Markov models and tailored to small and highly diverse viral genomes. V-pipe also includes benchmarking functionality providing a standardized environment for comparative evaluations of different pipeline configurations. We demonstrate this capability by assessing the impact of three different read aligners (Bowtie 2, BWA MEM, ngshmmalign) and two different variant callers (LoFreq, ShoRAH) on the performance of calling single-nucleotide variants in intra-host virus populations. V-pipe supports various pipeline configurations and is implemented in a modular fashion to facilitate adaptations to the continuously changing technology landscape. V-pipe is freely available at https://github.com/cbg-ethz/V-pipe

    V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data

    Motivation: High-throughput sequencing technologies are used increasingly not only in viral genomics research but also in clinical surveillance and diagnostics. These technologies facilitate the assessment of the genetic diversity in intra-host virus populations, which affects transmission, virulence and pathogenesis of viral infections. However, there are two major challenges in analysing viral diversity. First, amplification and sequencing errors confound the identification of true biological variants, and second, the large data volumes represent computational limitations. Results: To support viral high-throughput sequencing studies, we developed V-pipe, a bioinformatics pipeline combining various state-of-the-art statistical models and computational tools for automated end-to-end analyses of raw sequencing reads. V-pipe supports quality control, read mapping and alignment, low-frequency mutation calling, and inference of viral haplotypes. For generating high-quality read alignments, we developed a novel method, called ngshmmalign, based on profile hidden Markov models and tailored to small and highly diverse viral genomes. V-pipe also includes benchmarking functionality providing a standardized environment for comparative evaluations of different pipeline configurations. We demonstrate this capability by assessing the impact of three different read aligners (Bowtie 2, BWA MEM, ngshmmalign) and two different variant callers (LoFreq, ShoRAH) on the performance of calling single-nucleotide variants in intra-host virus populations. V-pipe supports various pipeline configurations and is implemented in a modular fashion to facilitate adaptations to the continuously changing technology landscape. ISSN: 1367-4803; ISSN: 1460-205

    Heritability of the HIV-1 reservoir size and decay under long-term suppressive ART

    The HIV-1 reservoir is the major hurdle to curing HIV-1. However, the impact of the viral genome on the HIV-1 reservoir, i.e. its heritability, remains unknown. We investigate the heritability of the HIV-1 reservoir size and its long-term decay by analyzing the distribution of those traits on viral phylogenies from both partial-pol and viral near full-length genome sequences. We use a unique nationwide cohort of 610 well-characterized HIV-1 subtype-B infected individuals on suppressive ART for a median of 5.4 years. We find that a moderate but significant fraction of the HIV-1 reservoir size 1.5 years after the initiation of ART is explained by genetic factors. At the same time, we find more tentative evidence for the heritability of the long-term HIV-1 reservoir decay. Our findings indicate that viral genetic factors contribute to the HIV-1 reservoir size and hence the infecting HIV-1 strain may affect individual patients’ hurdle towards a cure. ISSN: 2041-172

    Comparing mutational pathways to lopinavir resistance in HIV-1 subtypes B versus C

    Although combination antiretroviral therapies seem to be effective at controlling HIV-1 infections regardless of the viral subtype, there is increasing evidence for subtype-specific drug resistance mutations. The order and rates at which resistance mutations accumulate in different subtypes also remain poorly understood. Most of this knowledge is derived from studies of subtype B genotypes, despite subtype B not being the most abundant subtype worldwide. Here, we present a methodology for the comparison of mutational networks in different HIV-1 subtypes, based on Hidden Conjunctive Bayesian Networks (H-CBN), a probabilistic model for inferring mutational networks from cross-sectional genotype data. We introduce a Monte Carlo sampling scheme for learning H-CBN models for a larger number of resistance mutations and develop a statistical test to assess differences in the inferred mutational networks between two groups. We apply this method to infer the temporal progression of mutations conferring resistance to the protease inhibitor lopinavir in a large cross-sectional cohort of HIV-1 subtype C genotypes from South Africa, as well as to a data set of subtype B genotypes obtained from the Stanford HIV Drug Resistance Database and the Swiss HIV Cohort Study. We find strong support for different initial mutational events in the protease, namely at residue 46 in subtype B and at residue 82 in subtype C. The inferred mutational networks for subtype B versus C are significantly different, sharing only five constraints on the order of accumulating mutations, with mutation at residue 54 as the parental event. The results also suggest that mutations can accumulate along various alternative paths within subtypes, as opposed to a unique total temporal ordering. Beyond HIV drug resistance, the statistical methodology is applicable more generally for the comparison of inferred mutational networks between any two groups. ISSN: 1553-734X; ISSN: 1553-735

    Viral Diversity Based on Next-Generation Sequencing of HIV-1 Provides Precise Estimates of Infection Recency and Time Since Infection

    Background: Human immunodeficiency virus type 1 (HIV-1) genetic diversity increases over the course of infection and can be used to infer the time since infection and, consequently, infection recency, which are crucial for HIV-1 surveillance and the understanding of viral pathogenesis. Methods: We considered 313 HIV-infected individuals for whom reliable estimates of infection dates and next-generation sequencing (NGS)–derived nucleotide frequency data were available. Fractions of ambiguous nucleotides, obtained by population sequencing, were available for 207 samples. We assessed whether the average pairwise diversity calculated using NGS sequences provided a more exact prediction of the time since infection and classification of infection recency (<1 year after infection), compared with the fraction of ambiguous nucleotides. Results: NGS-derived average pairwise diversity classified an infection as recent with a sensitivity of 88% and a specificity of 85%. When considering only the 207 samples for which fractions of ambiguous nucleotides were available, the NGS-derived average pairwise diversity exhibited a higher sensitivity (90% vs 78%) and specificity (95% vs 67%) than the fraction of ambiguous nucleotides. Additionally, the average pairwise diversity could be used to estimate the time since infection with a mean absolute error of 0.84 years, compared with 1.03 years for the fraction of ambiguous nucleotides. Conclusions: Viral diversity based on NGS data is more precise than that based on population sequencing in its ability to predict infection recency and provides an estimated time since infection that has a mean absolute error of <1 year. ISSN: 0022-1899; ISSN: 1537-661
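As a rough illustration of the quantities named in this abstract, the sketch below computes an average pairwise diversity from per-site nucleotide frequencies, plus the sensitivity and specificity of a binary recency call. It is not the authors' implementation: the per-site formula (one minus the sum of squared frequencies, averaged over sites) is a common definition assumed here, and all function names are hypothetical.

```python
from typing import Sequence, Tuple


def site_diversity(freqs: Sequence[float]) -> float:
    """Probability that two reads drawn (with replacement) at one site differ.

    Assumed definition: 1 - sum(f_a^2) over the nucleotide frequencies f_a.
    Frequencies (or raw counts) are normalised to sum to 1 first.
    """
    total = sum(freqs)
    if total <= 0:
        raise ValueError("site has no coverage")
    return 1.0 - sum((f / total) ** 2 for f in freqs)


def average_pairwise_diversity(sites: Sequence[Sequence[float]]) -> float:
    """Mean of the per-site diversities across all sites considered."""
    if not sites:
        raise ValueError("no sites given")
    return sum(site_diversity(s) for s in sites) / len(sites)


def sensitivity_specificity(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    True means "recent infection" in this sketch.
    """
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    tn = sum(not t and not p for t, p in zip(truth, predicted))
    fp = sum(not t and p for t, p in zip(truth, predicted))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Two toy sites: one monomorphic, one with an even A/C split.
    toy_sites = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
    print(average_pairwise_diversity(toy_sites))
```

In practice a recency call would come from comparing the diversity value against a threshold; the abstract does not state one, so none is assumed here.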

    Within-patient genetic diversity of SARS-CoV-2

    SARS-CoV-2, the virus responsible for the current COVID-19 pandemic, is evolving into different genetic variants by accumulating mutations as it spreads globally. In addition to this diversity of consensus genomes across patients, RNA viruses can also display genetic diversity within individual hosts, and co-existing viral variants may affect disease progression and the success of medical interventions. To systematically examine the intra-patient genetic diversity of SARS-CoV-2, we processed a large cohort of 3939 publicly available deeply sequenced genomes with specialised bioinformatics software, along with 749 recently sequenced samples from Switzerland. We found that the distribution of diversity across patients and across genomic loci is very unbalanced, with a minority of hosts and positions accounting for much of the diversity. For example, the D614G variant in the Spike gene, which is present in the consensus sequences of 67.4% of patients, is also highly diverse within hosts, with 29.7% of the public cohort being affected by this coexistence and exhibiting different variants. We also investigated the impact of several technical and epidemiological parameters on genetic heterogeneity and found that age, which is known to be correlated with poor disease outcomes, is a significant predictor of viral genetic diversity.

    Quantifying SARS-CoV-2 spread in Switzerland based on genomic sequencing data

    Pathogen genomes provide insights into their evolution and epidemic spread. We sequenced 1,439 SARS-CoV-2 genomes from Switzerland, representing 3-7% of all confirmed cases per week. Using these data, we demonstrate that no one lineage became dominant, pointing against evolution towards general lower virulence. On an epidemiological level, we report no evidence of cryptic transmission before the first confirmed case. We find many early viral introductions from Germany, France, and Italy and many recent introductions from Germany and France. Over the summer, we quantify the number of non-traceable infections stemming from introductions, quantify the effective reproductive number, and estimate the degree of undersampling. Our framework can be applied to quantify evolution and epidemiology in other locations or for other pathogens based on genomic data.

    Global disparities in SARS-CoV-2 genomic surveillance

    Genomic sequencing is essential to track the evolution and spread of SARS-CoV-2, optimize molecular tests, treatments, and vaccines, and guide public health responses. To investigate global SARS-CoV-2 genomic surveillance, we used sequences shared via GISAID to estimate the impact of sequencing intensity and turnaround times on variant detection in 189 countries. In the first two years of the pandemic, 78% of high-income countries sequenced >0.5% of their COVID-19 cases, while 42% of low- and middle-income countries reached that mark. Around 25% of the genomes from high-income countries were submitted within 21 days, a pattern observed in 5% of the genomes from low- and middle-income countries. We found that sequencing around 0.5% of the cases, with a turnaround time <21 days, could provide a benchmark for SARS-CoV-2 genomic surveillance. Socioeconomic inequalities undermine global pandemic preparedness, and efforts must be made to support low- and middle-income countries in improving their local sequencing capacity. ISSN: 2041-172